Detecting Anomalous Groups in Categorical Datasets

Authors

  • Kaustav Das
  • Jeff Schneider
  • Daniel B. Neill

Abstract

We propose a new method for detecting groups of anomalies in categorical datasets. Our approach is a generalization of the spatial scan statistic, a commonly used method for detecting clusters of increased counts in spatial data. We extend this framework to non-spatial datasets with discrete-valued attributes, where the degree of anomalousness of each record depends on its attribute values and we wish to find self-similar groups of anomalous records. We model the relationships between the attributes using a probabilistic model (e.g., a Bayesian network), define a likelihood ratio statistic in terms of the pseudo-likelihoods for the null and alternative hypotheses, and maximize this statistic over all subsets of records. Since an exhaustive search over all such groups is computationally infeasible, we propose an efficient (but approximate) search heuristic. We show that this algorithm is able to accurately detect anomalous groups in real-world hospital, container shipping, and network connection data. This publication was supported in part by Grant Number 8-R01-HK000020-02 from CDC and by NSF under award IIS-0325581.
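The abstract describes the scoring and search only at a high level, so here is a minimal Python sketch under loudly stated assumptions, not the authors' implementation. It swaps the Bayesian network for a hypothetical fully factorized model (one Laplace-smoothed categorical distribution per attribute). Under that model each attribute's conditional given the others is just its marginal, so the pseudo-likelihood coincides with the ordinary likelihood. Candidate groups are assumed to be the records matching a conjunction of attribute=value conditions, and the greedy hill-climbing search is a generic stand-in for the paper's heuristic. All names (`fit_model`, `group_score`, `greedy_search`, `domains`, `alpha`) and the toy data are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Assumption: a record is a tuple of categorical values, and domains[j]
# lists every value attribute j may take (seen in training or not).


def fit_model(records, domains, alpha=1.0):
    """Fit one Laplace-smoothed categorical distribution per attribute.

    Hypothetical stand-in for a Bayesian network: attributes are treated as
    independent, so the pseudo-likelihood equals the ordinary likelihood.
    """
    model = []
    for j, values in enumerate(domains):
        counts = Counter(r[j] for r in records)
        total = len(records) + alpha * len(values)
        model.append({v: (counts[v] + alpha) / total for v in values})
    return model


def log_likelihood(records, model):
    """Sum of log-probabilities of all records under an independent-attribute model."""
    return sum(math.log(dist[r[j]]) for r in records for j, dist in enumerate(model))


def group_score(group, null_model, domains, alpha=1.0):
    """Log-likelihood ratio: alternative model refit to the group vs. the null model."""
    if not group:
        return -math.inf
    alt_model = fit_model(group, domains, alpha)
    return log_likelihood(group, alt_model) - log_likelihood(group, null_model)


def select(records, rule):
    """Records matching every (attribute index -> value) condition in rule."""
    return [r for r in records if all(r[j] == v for j, v in rule.items())]


def greedy_search(test, null_model, domains, alpha=1.0):
    """Greedily add attribute=value conditions while the group score strictly improves."""
    rule, group = {}, list(test)
    best = group_score(group, null_model, domains, alpha)
    while True:
        best_cand = None
        for j, values in enumerate(domains):
            if j in rule:
                continue  # each attribute is constrained at most once
            for v in values:
                cand_rule = {**rule, j: v}
                cand_group = select(test, cand_rule)
                s = group_score(cand_group, null_model, domains, alpha)
                if s > best + 1e-12:  # keep the best strictly improving candidate
                    best, best_cand = s, (cand_rule, cand_group)
        if best_cand is None:
            return rule, group, best
        rule, group = best_cand


# Toy data: attribute 0 never takes the value "c" in the reference (null) data.
domains = [["a", "b", "c"], ["x", "y"], ["p", "q"]]
train = list(product(["a", "b"], ["x", "y"], ["p", "q"])) * 10
test = list(product(["a", "b"], ["x", "y"], ["p", "q"])) * 2
test += [("c", "x", "q"), ("c", "y", "q")] * 5  # injected anomalous group

null_model = fit_model(train, domains)
rule, group, score = greedy_search(test, null_model, domains)
print(rule, len(group))  # prints: {0: 'c'} 10
```

On this toy data the search isolates the ten injected records that carry the value never seen in the reference data. Because the alternative model is refit to each candidate group and the ratio is summed over its records, the score favors groups that are both sizeable and consistently unlikely under the null model, which mirrors the scan-statistic idea described above.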

Similar resources

Detecting Patterns of Anomalies

An anomaly is an observation that does not conform to the expected normal behavior. With the ever increasing amount of data being collected universally, automatic surveillance systems are becoming more popular and are increasingly using data mining methods to detect patterns of anomalies. Detecting anomalies can provide useful and actionable information in a variety of real-world scenarios. For...

MultiAspectSpotting: Spotting Anomalous Behavior within Count Data Using Tensor

Methods for finding anomalous behaviors are attracting much attention, especially for very large datasets with several attributes with tens of thousands of categorical values. For example, security engineers try to find anomalous behaviors, i.e., remarkable attacks which greatly differ from the day’s trend of attacks, on the basis of intrusion detection system logs with source IPs, destination ...

Detecting Extreme Rank Anomalous Collections

Anomaly or outlier detection has a wide range of applications, including fraud and spam detection. Most existing studies focus on detecting point anomalies, i.e., individual, isolated entities. However, there is an increasing number of applications in which anomalies do not occur individually, but in small collections. Unlike the majority, entities in an anomalous collection tend to share certa...

Separation Between Anomalous Targets and Background Based on the Decomposition of Reduced Dimension Hyperspectral Image

The application of anomaly detection has been given a special place among the different processing tasks for hyperspectral images. Nowadays, many methods use only background information to distinguish anomaly pixels from the background. Due to noise and the presence of anomaly pixels in the background, the assumption of the specific statistical distribution of the background, as well as the co...

Detecting Anomalous Sensor Events in Smart Home Data for Enhancing the Living Experience

A secure lifestyle at home is in demand more than ever. Today’s home is more than just four walls and a roof. Technology at home is on the rise and the place for smart home solutions is growing. One of the major concerns for smart home systems is the capability of adapting to the user. Personalizing the behavior of the home may provide improved comfort, control, and safety. One...

Publication date: 2009